library(fpp3)5_3_Classical_Decomposition
References
NOTE: this material is not original, but rather an extension of the following reference, which is publicly available.
- Hyndman, R.J., & Athanasopoulos, G., 2021. Forecasting: principles and practice. 3rd edition. Link
Packages
Classical decomposition:
It is the starting point for most other methods of time series decomposition. In addition, it is simple enough that it can be easily implemented from scratch, which is something we are going to do in class.
Essentially are 2 variants of classical decomposition:
- Classical decomposition of aditive schemes
- Classical decomposition of multiplicative schemes
The algorithm depends on whether the length seasonal period is even or uneven. Examples:
- If we have daily data we may consider, for example, a period of one week. That seasonal period would be of length 7 (7 days).
- NOTE: For daily data we could also consider other seasonal periods: one month, one quarter… even one year.
- If we have monthly data, we could consider a yearly seasonal period. That seasonal period would be of length 12 (12 months).
- If we have quarterly data, we could consider a yearly seasonal period. That seasonal period would be of length 4 (4 quarters).
- …
Because the algorithm approximates the trend with a centered moving average, the even or uneven nature of the seasonal period is important. See the separate notebook on moving averages, which has an associated video to understand this:
- Even seasonal period (\(m\) even): trend approximated by a 2x\(m\) moving average.
- Uneven seasonal period \(m\): trend approximated by an \(m\) moving average.
Additive schemes - classical decomposition
Step 1 - Estimate trend
- If \(m\) is an even number, compute the trend-cycle component \(T_t\) using 2 x \(m\)-MA.
- If \(m\) is an odd number compute the trend-cycle component \(T_t\) using an \(m\)-MA.
Output: the trend component \(T_t\)
Step 2 - Detrended series
Calculate the de-trended series: \(D_t = y_t - T_t\)
Output: the detrended series \(D_t\)
Step 3 - Compute seasonal component
Main assumption to compute the seasonal component: the seasonal component remains constant over time. This is one of the main weaknesses of the classical decomposition.
Step 3.1. Average the de-trended values for each component of the season.
Example: with monthly data the seasonal component for march is the average of all the detrended March values in the data
Output: a vector of length \(m\), with \(m\) being the length of the seasonal compoent. + Let us call this vector \(S_unadj\) for unadjusted seasonal component
Step 3.2. Adjust \(S_{unadj}\), the output of step 3.1 to ensure that the sum of its values add to 0.
3.2.1 Sum all the components of the vector \(S_{unadj}\). Lets call this output \(a\). This result is the total deviation from 0 we want to correct.
3.2.2 Divide the previous result \(a\) by the length of the seasonal period \(m\). Then subtract that amount \(a/m\) from every component of \(S_{unadj}\)
Output: the seasonal component \(S\).
By doing this the seasonal component will not contribute to the mean value of the time series over one period. That is, the average of the seasonal component computed in this manner over one period will be 0.
\[ \text{condition imposed: }\sum_{i=1}^{m} S_i= 0 \]
As a result, the seasonal component does not contribute to the average of the time series over one period
\[ \frac{\sum_{i=1}^my_i}{m} = \frac{\sum_{i=1}^m(T_i + S_i + R_i)}{m} \underset{\underset{\sum_{i=1}^mS_i = 0}{\uparrow}}{=} \frac{\sum_{i=1}^m(T_i + R_i)}{m} \]
Step 4 - Compute the remainder component
\(R_t = D_t - S_t = y_t - T_t - S_t\)
Example: computation using the function classical_decomposition():
# Filter the series and select relevant columns
us_retail_employment <- us_employment %>%
filter(year(Month) >= 1990, Title == "Retail Trade") %>%
select(-Series_ID)
# Let us compute the classical decomposition using the function in the feasts library:
classical_dec <- us_retail_employment %>%
# Fit the classical decomposition model specifying an additive scheme
model(
clas_deic = classical_decomposition(Employed, type = "additive")
) %>%
# Extract components from the fitted model
components()
select(classical_dec, Month, Employed, trend, seasonal, random)# A tsibble: 357 x 5 [1M]
Month Employed trend seasonal random
<mth> <dbl> <dbl> <dbl> <dbl>
1 1990 Jan 13256. NA -75.5 NA
2 1990 Feb 12966. NA -273. NA
3 1990 Mar 12938. NA -253. NA
4 1990 Apr 13012. NA -190. NA
5 1990 May 13108. NA -88.9 NA
6 1990 Jun 13183. NA -10.4 NA
7 1990 Jul 13170. 13178. -13.3 5.65
8 1990 Aug 13160. 13161. -9.99 8.80
9 1990 Sep 13113. 13141. -87.4 59.9
10 1990 Oct 13185. 13117. 34.6 33.8
# ℹ 347 more rows
classical_dec %>%
autoplot()Warning: Removed 6 rows containing missing values (`geom_line()`).
Example: manual computation implementing the described algorithm.
Step 1: compute the trend using moving averages
# Compute moving averages to estimate the trend
# Because the seasonal period is 12, we need a 2x12-MA
manual_decomposition <- us_retail_employment %>%
mutate(
`12-MA` = slider::slide_dbl(Employed, mean,
.before = 5, .after = 6, .complete = TRUE),
`2x12-MA` = slider::slide_dbl(`12-MA`, mean,
.before = 1, .after = 0, .complete = TRUE)
)
# Plot the computed trend
manual_decomposition %>%
autoplot(Employed, colour = "gray") +
geom_line(aes(y = `2x12-MA`), colour = "#D55E00") +
labs(y = "Persons (thousands)",
title = "Total employment in US retail")Warning: Removed 12 rows containing missing values (`geom_line()`).
Step 2 - detrend the time series
# Compute the detrended component:
manual_decomposition <-
manual_decomposition %>%
mutate(
detrended = Employed - `2x12-MA`,
)
manual_decomposition %>% autoplot(detrended)Warning: Removed 12 rows containing missing values (`geom_line()`).
This is the resulting detrended component, which contains the seasonal + remainder component: \(D_t = y_t - T_t = S_t + R_t\)
Step 3: compute the seasonal component
Step 3.1. Compute \(S_{unadj}\) averaging the de-trended values for each element of the season:
We will store the result in a separate dataframe called df_seasonal. This dataframe will contain 2 columns:
- An identifier for the number of the month. This identifier will be the same for every month, regardless of the year. More generally, we need to compute an identifier for each element of the seasonal component.
- We will use this identifier to group the data and compute the mean value
s_unadj: we compute it averaging the detrended series for each element of the seasonal component. For example, with monthly data, the seasonal component for March is the average of all the detrended March values in the data
We will start by creating the mentioned identifier for each month (regardless of the year) on our dataframe holding the decomposition:
manual_decomposition <-
manual_decomposition %>%
# Compute an identifier for each month, regardless
# of the year. This will define the subsequent groups
mutate(
n_month = month(Month)
)
manual_decomposition# A tsibble: 357 x 7 [1M]
Month Title Employed `12-MA` `2x12-MA` detrended n_month
<mth> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1990 Jan Retail Trade 13256. NA NA NA 1
2 1990 Feb Retail Trade 12966. NA NA NA 2
3 1990 Mar Retail Trade 12938. NA NA NA 3
4 1990 Apr Retail Trade 13012. NA NA NA 4
5 1990 May Retail Trade 13108. NA NA NA 5
6 1990 Jun Retail Trade 13183. 13186. NA NA 6
7 1990 Jul Retail Trade 13170. 13170. 13178. -7.66 7
8 1990 Aug Retail Trade 13160. 13151. 13161. -1.20 8
9 1990 Sep Retail Trade 13113. 13130. 13141. -27.5 9
10 1990 Oct Retail Trade 13185. 13103. 13117. 68.5 10
# ℹ 347 more rows
Now we will use this newly created identifier to compute the averages of the detrended time series for each month regardless of the year. Expressed in more general terms, we are computing the averages of the detrended time series for each element of the seasonal component.
df_seasonal <-
manual_decomposition %>%
# turn into a tibble to be able to use group_by
# without the limitations on a tsibble
as_tibble() %>%
# group by the created identifier
group_by(n_month) %>%
# compute the average seasonal component for every month
summarise(
s_unadj = mean(detrended, na.rm = TRUE)
) %>%
ungroup() %>%
# Turn back into a tsibble
as_tsibble(index = n_month)
df_seasonal# A tsibble: 12 x 2 [1]
n_month s_unadj
<dbl> <dbl>
1 1 -75.5
2 2 -273.
3 3 -253.
4 4 -190.
5 5 -88.9
6 6 -10.4
7 7 -13.3
8 8 -10.0
9 9 -87.4
10 10 34.6
11 11 394.
12 12 573.
#Check the aspect of the obtained unadjusted seasonal component
df_seasonal %>%
autoplot(s_unadj) +
scale_x_continuous(breaks = seq(1, 12),
minor_breaks = seq(1, 12)) +
geom_point(color = "red")This is the resulting unadjusted seasonal component \(S_{unadj}\):
- The first point is the unadjusted value of the seasonal component for January.
- The second point is the unadjusted value of the seasonal component for February.
- The third point is the unadjusted value of the seasonal component for March.
- …
Step 3.2 Adjust the seasonal component
Let us check if the values of the component add up to 0 as we wanted to impose:
#The sum does not add up to 0
a = sum(df_seasonal$s_unadj)
a[1] -0.2273091
We can see that the sum is not 0. We will divide the sum by the length of the seasonal component (in this case \(m\) = 12) and then distribute this deviation from 0 evenly over the seasonal component.
df_seasonal <-
df_seasonal %>%
# Correction so that the sum of the seasoan components is 0
mutate(
seasonal_c = s_unadj - a/12
) %>%
# Drop s_unadj since we do not need it anymore.
select(n_month, seasonal_c)
sum(df_seasonal$seasonal_c)[1] 1.136868e-13
As you can see, the seasonal component is now 0. Great. Let us bring the seasonal component back into our original dataset. For this we will use a join operation. Remember that we created previously a column called n_month that now will come in handy when doing this join:
# Bring seasonal component into the dataset (I chose to use a "join" statement, very convenient)
manual_decomposition <-
left_join(manual_decomposition, df_seasonal, by = "n_month")
manual_decomposition# A tsibble: 357 x 8 [1M]
Month Title Employed `12-MA` `2x12-MA` detrended n_month seasonal_c
<mth> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1990 Jan Retail Trade 13256. NA NA NA 1 -75.5
2 1990 Feb Retail Trade 12966. NA NA NA 2 -273.
3 1990 Mar Retail Trade 12938. NA NA NA 3 -253.
4 1990 Apr Retail Trade 13012. NA NA NA 4 -190.
5 1990 May Retail Trade 13108. NA NA NA 5 -88.9
6 1990 Jun Retail Trade 13183. 13186. NA NA 6 -10.4
7 1990 Jul Retail Trade 13170. 13170. 13178. -7.66 7 -13.3
8 1990 Aug Retail Trade 13160. 13151. 13161. -1.20 8 -9.99
9 1990 Sep Retail Trade 13113. 13130. 13141. -27.5 9 -87.4
10 1990 Oct Retail Trade 13185. 13103. 13117. 68.5 10 34.6
# ℹ 347 more rows
Step 4: compute the remainder component removing the seasonal component from the detrended series
Now that we have the seasonal component, we obtain the remainder component by subtracting it from the de-trended series.
# Compute remainder component:
manual_decomposition <-
manual_decomposition %>%
mutate(
remainder = detrended - seasonal_c
)
select(manual_decomposition, Month, `2x12-MA`, seasonal_c, remainder)# A tsibble: 357 x 4 [1M]
Month `2x12-MA` seasonal_c remainder
<mth> <dbl> <dbl> <dbl>
1 1990 Jan NA -75.5 NA
2 1990 Feb NA -273. NA
3 1990 Mar NA -253. NA
4 1990 Apr NA -190. NA
5 1990 May NA -88.9 NA
6 1990 Jun NA -10.4 NA
7 1990 Jul 13178. -13.3 5.65
8 1990 Aug 13161. -9.99 8.80
9 1990 Sep 13141. -87.4 59.9
10 1990 Oct 13117. 34.6 33.8
# ℹ 347 more rows
We have all the components. We are now going to compute the maximum absolute errors of our manual computation, comparing it with the result provided by the function classical_decomposition()
# Evaluation of absolute errors
max_err_seasonal = max(classical_dec$seasonal - manual_decomposition$seasonal_c, na.rm = TRUE)
max_err_trend = max(classical_dec$trend - manual_decomposition$`2x12-MA`, na.rm = TRUE)
max_err_rand = max(classical_dec$random - manual_decomposition$remainder, na.rm = TRUE)
# Compute the max value of all these vectors
max(c(max_err_seasonal, max_err_trend, max_err_rand))[1] 6.025402e-12
QUESTION Can you provide an interpretation of this result?
The result is not 0 but it is sufficiently close to 0 that we can attribute this deviation to numerical errors (particularly given the fact that this error is much smaller than any of the components we want to compute).
Extra: generate a plot of the components we just created
# As an additional ggplot visualization exercise, below the code to visualize
# the decomposition we have computed manually
# Define levels for the factor variable to ensure proper ordering
# in the facet grid figure.
factor_levels = c("Employed", "trend", "seasonal_c", "remainder")
manual_decomposition_long <-
manual_decomposition %>%
# Rename so that the labels are consistent
rename(trend = `2x12-MA`) %>%
# Select the columns of interest
select(Month, Employed, trend, seasonal_c, remainder) %>%
# Generate long format based on a single variable so that we can
# facet_grid following this variable
pivot_longer(cols = Employed:remainder, names_to = "component") %>%
# Convert to factor to ensure proper ordering in the figure
mutate(
component = factor(component, levels=factor_levels)
)
# Now generate the graph
manual_decomposition_long %>%
ggplot(aes(x = Month, y = value)) +
geom_line() +
# scales = "free_y" so that each component has an adapted y-axis range
facet_grid(component ~ ., scales = "free_y")Warning: Removed 6 rows containing missing values (`geom_line()`).
We can see that it looks just like the decomposition returned by the fable library.
Multiplicative scheme - classical decomposition
In this case substractions are simply replaced by divisions
Step 1 - Estimate trend
If \(m\) is an even number, compute the trend-cycle component \(\hat{T_t}\) using 2 x \(m\)-MA. If \(m\) is an odd number compute the trend-cycle component \(\hat{T_t}\) using an \(m\)-MA.
Step 2 - Detrended series
Calculate the de-trended series: \(y_t / \hat{T_t}\)
Step 3 - Compute seasonal component
We will make the underlying assumption that the seasonal component remains the same for every season. This is one of the main weaknesses of the classical decomposition.
Average the de-trended values for each component of the season.
- Example: with monthly data the seasonal component for march is the average of all the detrended March values in the data
Adjust the seasonal component values to ensure they add to \(m\). This ensures that the average of the seasonal component over one period is 1.
2.1 Sum the seasonal components of everymonth. The result is the total deviation from 0 of the sum of the seasonal components.
2.2 Divide the previous result by the seasonal period and substract that amount from each seasonal component.
Since the seasonal components add to \(m\), the average of the time series over one period is a weighted average of the product of the trend and the remainder components, with the seasonal components divided by \(m\) as weights that add to 1.In this case the weighted average is not symmetrical. If \(m\) is the length of the seasonal period we have:
\[ \sum_{i=1}^m{S_i} = m \]
\[ \frac{\sum_{i=1}^m(T_i S_i R_i)}{m} = \underbrace{\sum_{i=1}^m\frac{Si}{m}(T_i R_i) }_{weighted\;mov.\;av.\\ of\;R_iT_i\;since\sum_{i=1}^mS_i/m = 1} \]
Step 4 - Compute the remainder component
\(\hat{R} = y_t /\hat{T_t}\hat{S_t}\)
Seasonally adjusted data:
The seasonally adjusted data is the reslt of removing the seasonal component from the original data
- Additive decomposition: \(y_t - St\)
- Multiplicative decomposition: \(y_t / S_t\)
It contains **the rmainder as well as the trend-cucle component*. Therefore they are not “smooth”. “Downturns” or “upturms” can be misleading.
- If you want to look for turning poins and interpret changes in direction, it is best to use the trend cycle component rather than the seasonally adjusted data
The seasonally adjusted data is useful if the variation due to seasonality is not of primary interest.
- Example: unemployment data are usually seasonally adjusted in order to highlight variation due to the underlying state of the economy rather than the seasonal variation
classical_dec %>%
as_tsibble() %>%
autoplot(Employed, colour = "gray") +
geom_line(aes(y=season_adjust), colour = "#0072B2") +
labs(y = "Persons (thousands)",
title = "Total employment in US retail")Exercise 1
Consider the last five years of the Gas data from aus_production
gas <- tail(aus_production, 5*4) %>% select(Gas)
gas# A tsibble: 20 x 2 [1Q]
Gas Quarter
<dbl> <qtr>
1 221 2005 Q3
2 180 2005 Q4
3 171 2006 Q1
4 224 2006 Q2
5 233 2006 Q3
6 192 2006 Q4
7 187 2007 Q1
8 234 2007 Q2
9 245 2007 Q3
10 205 2007 Q4
11 194 2008 Q1
12 229 2008 Q2
13 249 2008 Q3
14 203 2008 Q4
15 196 2009 Q1
16 238 2009 Q2
17 252 2009 Q3
18 210 2009 Q4
19 205 2010 Q1
20 236 2010 Q2
- Use
classical_decompositionwithtype=multiplicativeto calculate the trend-cycle and seasonal indices
# Let us compute the classical decomposition using the function in the feasts library:
classical_dec <- gas %>%
model(
classical_decomposition(Gas ~ season(4), type = "multiplicative")
) %>%
components()
classical_dec# A dable: 20 x 7 [1Q]
# Key: .model [1]
# : Gas = trend * seasonal * random
.model Quarter Gas trend seasonal random season_adjust
<chr> <qtr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 "classical_decomposition(G… 2005 Q3 221 NA 1.13 NA 196.
2 "classical_decomposition(G… 2005 Q4 180 NA 0.925 NA 195.
3 "classical_decomposition(G… 2006 Q1 171 200. 0.875 0.974 195.
4 "classical_decomposition(G… 2006 Q2 224 204. 1.07 1.02 209.
5 "classical_decomposition(G… 2006 Q3 233 207 1.13 1.00 207.
6 "classical_decomposition(G… 2006 Q4 192 210. 0.925 0.987 208.
7 "classical_decomposition(G… 2007 Q1 187 213 0.875 1.00 214.
8 "classical_decomposition(G… 2007 Q2 234 216. 1.07 1.01 218.
9 "classical_decomposition(G… 2007 Q3 245 219. 1.13 0.996 218.
10 "classical_decomposition(G… 2007 Q4 205 219. 0.925 1.01 222.
11 "classical_decomposition(G… 2008 Q1 194 219. 0.875 1.01 222.
12 "classical_decomposition(G… 2008 Q2 229 219 1.07 0.974 213.
13 "classical_decomposition(G… 2008 Q3 249 219 1.13 1.01 221.
14 "classical_decomposition(G… 2008 Q4 203 220. 0.925 0.996 219.
15 "classical_decomposition(G… 2009 Q1 196 222. 0.875 1.01 224.
16 "classical_decomposition(G… 2009 Q2 238 223. 1.07 0.993 222.
17 "classical_decomposition(G… 2009 Q3 252 225. 1.13 0.994 224.
18 "classical_decomposition(G… 2009 Q4 210 226 0.925 1.00 227.
19 "classical_decomposition(G… 2010 Q1 205 NA 0.875 NA 234.
20 "classical_decomposition(G… 2010 Q2 236 NA 1.07 NA 220.
- Compute and plot the seasonally adjusted data.
classical_dec %>%
as_tsibble() %>%
autoplot(Gas, colour = "gray") +
geom_line(aes(y=season_adjust), colour = "#0072B2") +
labs(y = "Gas production (Petajoules)",
title = "Australian gas production")- Compute manually the classical decomposition of the series (as we did in class, but keep in mind that this scheme is multiplicative and that now the seasonal period is 4 instead of 12).
### 1. Compute the trend with a 2x4 MA
# Compute moving averages
# Because the seasonal period is 12, we need a 2x12-MA
manual_decomposition <- gas %>%
mutate(
`4-MA` = slider::slide_dbl(Gas, mean,
.before = 1, .after = 2, .complete = TRUE),
`2x4-MA` = slider::slide_dbl(`4-MA`, mean,
.before = 1, .after = 0, .complete = TRUE)
)
# Plot the computed trend
manual_decomposition %>%
autoplot(Gas, colour = "gray") +
geom_line(aes(y = `2x4-MA`), colour = "#D55E00") +
labs(y = "Gas production (Petajoules)",
title = "Australian gas production")Warning: Removed 4 rows containing missing values (`geom_line()`).
### 2. Compute the de-trended component:
# Compute the detrended component:
manual_decomposition <- manual_decomposition %>%
mutate(detrended = Gas / `2x4-MA`,
n_quarter = quarter(Quarter)) # Add an identifier for the month (used later for a join statement)
manual_decomposition %>% autoplot(detrended)Warning: Removed 4 rows containing missing values (`geom_line()`).
### 3. Compute the seasonal component grouping and averaging the detrended component
res <- manual_decomposition %>%
index_by(n_quarter) %>%
#Compute seasonal component
summarise(
seasonal_c = mean(detrended, na.rm = TRUE)
)
res# A tsibble: 4 x 2 [1]
n_quarter seasonal_c
<int> <dbl>
1 1 0.875
2 2 1.07
3 3 1.13
4 4 0.925
#Check the aspect of the obtained seasonal component
res %>%
autoplot(seasonal_c)### 4. Correct the seasonal components so that they add to m = 4
m = 4
sum_dev = m - sum(res$seasonal_c)
res <- res %>%
# Correction so that the sum of the seasoan components is 0
mutate(seasonal_c = seasonal_c + sum(sum_dev)/m)
sum(res$seasonal_c)[1] 4
#Check the aspect of the obtained seasonal component
res %>%
autoplot(seasonal_c)# Bring seasonal component into the dataset
# (I chose to use a "join" statement, very convenient)
manual_decomposition <- left_join(manual_decomposition, res, by = "n_quarter")
manual_decomposition# A tsibble: 20 x 7 [1Q]
Gas Quarter `4-MA` `2x4-MA` detrended n_quarter seasonal_c
<dbl> <qtr> <dbl> <dbl> <dbl> <int> <dbl>
1 221 2005 Q3 NA NA NA 3 1.13
2 180 2005 Q4 199 NA NA 4 0.925
3 171 2006 Q1 202 200. 0.853 1 0.875
4 224 2006 Q2 205 204. 1.10 2 1.07
5 233 2006 Q3 209 207 1.13 3 1.13
6 192 2006 Q4 212. 210. 0.913 4 0.925
7 187 2007 Q1 214. 213 0.878 1 0.875
8 234 2007 Q2 218. 216. 1.08 2 1.07
9 245 2007 Q3 220. 219. 1.12 3 1.13
10 205 2007 Q4 218. 219. 0.937 4 0.925
11 194 2008 Q1 219. 219. 0.887 1 0.875
12 229 2008 Q2 219. 219 1.05 2 1.07
13 249 2008 Q3 219. 219 1.14 3 1.13
14 203 2008 Q4 222. 220. 0.921 4 0.925
15 196 2009 Q1 222. 222. 0.883 1 0.875
16 238 2009 Q2 224 223. 1.07 2 1.07
17 252 2009 Q3 226. 225. 1.12 3 1.13
18 210 2009 Q4 226. 226 0.929 4 0.925
19 205 2010 Q1 NA NA NA 1 0.875
20 236 2010 Q2 NA NA NA 2 1.07
# Compute remainder component:
manual_decomposition <- manual_decomposition %>%
mutate(remainder = detrended / seasonal_c)
select(manual_decomposition, Quarter, Gas, `2x4-MA`, seasonal_c, remainder)# A tsibble: 20 x 5 [1Q]
Quarter Gas `2x4-MA` seasonal_c remainder
<qtr> <dbl> <dbl> <dbl> <dbl>
1 2005 Q3 221 NA 1.13 NA
2 2005 Q4 180 NA 0.925 NA
3 2006 Q1 171 200. 0.875 0.974
4 2006 Q2 224 204. 1.07 1.02
5 2006 Q3 233 207 1.13 1.00
6 2006 Q4 192 210. 0.925 0.987
7 2007 Q1 187 213 0.875 1.00
8 2007 Q2 234 216. 1.07 1.01
9 2007 Q3 245 219. 1.13 0.996
10 2007 Q4 205 219. 0.925 1.01
11 2008 Q1 194 219. 0.875 1.01
12 2008 Q2 229 219 1.07 0.974
13 2008 Q3 249 219 1.13 1.01
14 2008 Q4 203 220. 0.925 0.996
15 2009 Q1 196 222. 0.875 1.01
16 2009 Q2 238 223. 1.07 0.993
17 2009 Q3 252 225. 1.13 0.994
18 2009 Q4 210 226 0.925 1.00
19 2010 Q1 205 NA 0.875 NA
20 2010 Q2 236 NA 1.07 NA
# Evaluation of absolute errors
max_err_seasonal = max(classical_dec$seasonal - manual_decomposition$seasonal_c, na.rm = TRUE)
max_err_trend = max(classical_dec$trend - manual_decomposition$`2x4-MA`, na.rm = TRUE)
max_err_rand = max(classical_dec$random - manual_decomposition$remainder, na.rm = TRUE)
max(c(max_err_seasonal, max_err_trend, max_err_rand))[1] 3.720274e-06
The errors are small enough for us to consider that we have correctly
implemented the algorithm in the function `classical_decomposition()`.
It would be interesting to explore the specific reasons for this deviation,
but that is something we will not do.- Change one observations to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
gas %>%
mutate(Gas = if_else(Quarter == yearquarter("2007Q4"), Gas + 300, Gas)) %>%
model(decomp = classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
as_tsibble() %>%
autoplot(season_adjust) +
labs(title = "Seasonally adjusted data", y = "Petajoules")* The outlier is more or less in the middle of the series.
* The trend is approximated using a 2x4-MA. Since the outlier is in the middle
of the series, this information is in the region where the moving average can
be computed and the information of the outlier is captured by the trend component.
* The detrended component also inherits the information of the outlier,
since it is computed as $y_t/T_t$.
* The seasonal component is computed averaging this detrended component and
therefore is also affected by the outlier. It will not accurately approximate
the seasonal component.- Does it make any difference if the outlier is near the end rather than in the middle of the series?
As a result the seasonally adjusted data still has seasonality, because we have
not adequately captured the seasonal component.- Does it make any difference if the outlier is near the end rather than in the middle of the series?
gas %>%
mutate(Gas = if_else(Quarter == yearquarter("2010Q2"), Gas + 300, Gas)) %>%
model(decomp = classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
as_tsibble() %>%
autoplot(season_adjust) +
labs(title = "Seasonally adjusted data", y = "Petajoules")* The outlier is located at the last point of the series.
* The trend is approximated using a 2x4-MA. Since the outlier is at the very
end of the series, this information is in the region where the 2x4 moving
average cannot be computed and the information of the outlier is not captured
by the trend component.
* The detrended component is defined in exactly the same region as the trend
approximation, since it is computed as $y_t/T_t$. Therefore the detrended
component does not capture the information about the outlier.
* The seasonal component is computed averaging this detrended component.
Therefore it does not inherit information from the outlier. The seasonal
component is not affected by the presence of this outlier.Exercise 2
- Take the dataset
vic_elecand aggregate the data to obtain the daily demand usingindex_by(). The command below computes the classical decomposition considering a period of 1 week (instead of 1 year) having filtered for the year 2012.
vic_elec_w <-
vic_elec %>%
index_by(Date) %>%
summarize(
avg_demand = mean(Demand, na.rm = TRUE)
) %>%
filter(year(Date) == 2012)
vic_elec_w_dcmp <-
vic_elec_w %>%
model(
decomp = classical_decomposition(avg_demand ~ season(period = 7),
type = "additive")
) %>%
components()
vic_elec_w_dcmp %>% autoplot()Warning: Removed 3 rows containing missing values (`geom_line()`).
Compute manually this decomposition (no need to depict it, just compute the values within vic_elec_w_dcmp manually).
# Compute moving averages to estimate the trend
# Because the seasonal period is 12, we need a 2x12-MA
manual_decomposition <- vic_elec_w %>%
mutate(
`7-MA` = slider::slide_dbl(avg_demand, mean,
.before = 3, .after = 3, .complete = TRUE)
)
# Plot the computed trend
manual_decomposition %>%
autoplot(avg_demand, colour = "gray") +
geom_line(aes(y = `7-MA`), colour = "#D55E00") +
labs(y = "average demand (MWh)",
title = "Average weekly electricity demand")Warning: Removed 6 rows containing missing values (`geom_line()`).
Let us do a first sanity check and compare the trend computed by the function
and the resulting of our moving averagetrend_func <- vic_elec_w_dcmp$trend
trend_man <- manual_decomposition$`7-MA`
# Absolute errors at each point in time of the trend
abs_err <- abs(vic_elec_w_dcmp$trend - manual_decomposition$`7-MA`)
# Biggest numerical difference
# of order 1E-12 -> numerical noise, look at the order of magnitude
max(abs_err, na.rm = TRUE)[1] 1.818989e-12
# Lets round to the 3rd decimal (accuracy of up to KWh) and then
# check equality
# This sum equals 0, which means that they are the same at least down
# to the third decimal (give it some thought)
sum(!(round(trend_func, 3) == round(trend_man, 3)))[1] NA
# Alternative way to check this in R.
# This returns TRUE, meaning they are the same at least
# down to the third decimal
identical(round(trend_func, 3) , round(trend_man, 3))[1] TRUE
# Compute the detrended component:
manual_decomposition <- manual_decomposition %>%
mutate(
# Compute detrended component
detrended = avg_demand - `7-MA`,
# Identifier added for the day of the week
wday = wday(Date, label = FALSE, abbr = FALSE)
)
# Plot detrended component (additive + seasonal)
manual_decomposition %>% autoplot(detrended)Warning: Removed 6 rows containing missing values (`geom_line()`).
# Include the seasonal component
res <- manual_decomposition %>%
index_by(wday) %>%
#Compute seasonal component
summarise(
seasonal_c = mean(detrended, na.rm = TRUE)
) %>%
ungroup()
res# A tsibble: 7 x 2 [1]
wday seasonal_c
<dbl> <dbl>
1 1 -548.
2 2 121.
3 3 189.
4 4 203.
5 5 233.
6 6 160.
7 7 -380.
#Check the aspect of the obtained seasonal component
res %>%
autoplot(seasonal_c) +
scale_x_continuous(breaks = seq(1, 7),
minor_breaks = seq(1, 7)) +
geom_point(color = "red")Now we make the necessary correction so that all the sum of the seasonal
component over an entire season = 0corr <- sum(res$seasonal_c) / 7
res <-
res %>%
mutate(
seasonal_c = seasonal_c - corr
)# Bring back our seasonal component to the time series
manual_decomposition <-
manual_decomposition %>%
left_join(res, by = "wday")Let us introduce a new sanity check and check that our seasonal component
matches the component from the function `classical_decomposition()`seasonal_func <- round(vic_elec_w_dcmp$seasonal, 3)
seasonal_man <- round(manual_decomposition$seasonal_c, 3)
# This sum is 0, meaning that both vectors coincide at least down to
# 3 decimals. Give this some thought to understand it.
sum(!(seasonal_func == seasonal_man))[1] 0
# Max absolute error of order 1E-14
max(vic_elec_w_dcmp$seasonal - manual_decomposition$seasonal_c)[1] 5.684342e-14
# Finally, compute the remainder component
manual_decomposition <-
manual_decomposition %>%
mutate(
remainder = detrended - seasonal_c
)# Last check - compare both remainders (a.k.a random)
remainder_func <- round(vic_elec_w_dcmp$random, 3)
remainder_man <- round(manual_decomposition$remainder, 3)
# This sum is 0, meaning that both vectors coincide at least down to
# 3 decimals. Give this some thought to understand it.
sum(!(remainder_func[!is.na(remainder_func)] == remainder_man[!is.na(remainder_man)]))[1] 0
# As an additional ggplot visualization exercise, below the code to visualize
# the decomposition we have computed manually
# Define levels for the factor variable to ensure proper ordering
# in the facet grid figure.
factor_levels = c("avg_demand", "trend", "seasonal_c", "remainder")
manual_decomposition_long <-
manual_decomposition %>%
# Rename so that the labels are consistent
rename(trend = `7-MA`) %>%
# Select the columns of interest
select(Date, avg_demand, trend, seasonal_c, remainder) %>%
# Generate long format based on a single variable so that we can
# facet_grid following this variable
pivot_longer(cols = avg_demand:remainder, names_to = "component") %>%
# Convert to factor to ensure proper ordering in the figure
mutate(
component = factor(component, levels=factor_levels)
)
# Now generate the graph
manual_decomposition_long %>%
ggplot(aes(x = Date, y = value)) +
geom_line() +
# scales = "free_y" so that each component has an adapted y-axis range
facet_grid(component ~ ., scales = "free_y")Warning: Removed 3 rows containing missing values (`geom_line()`).
Comments on classical decomposition:
Classical decomposition is still widely used, but it is not recommended. It is however the foundation for many of the other methods and understanding it is important. Some problems with classical decomposition are summarised below: